-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gix upgrade and optimizations #1081
Conversation
It was fund implementing it and this kind of optimization isn't even present in Without a commit-graph, it takes ~13s to run onefetch on the linux kernel, and with commitgraph it now takes ~3.8s (at the same memory consumption). That's not quite linear scaling, but scaling worth having. The effective parallelism seems to be 5 cores of the 8.5 it could have, but it's easy to forget that there is also other workload that might not parallelize this well. Separate measurements shows that just obtaining commit metrics takes 10.3s without commitgraph, and 1.43s with commitgraph, a 7.2x speedup :).
|
The latest improvements make more data available during commit iteration, which makes some improvements possible. Doing so potentially saves some time, even though it's not visible in practice.
…t commit information. In practice, this change is unlikely to be noticeable, especially since `--no-bots` is not enabled by default. However, if it is, bot commits will now contribute to the most recent commit times and the first commit times, which will help if the only commit is a bot commit. It also helps with allowing an upcoming multi-threaded implementation that leverages the commit-graph which in turn will greatly speed up information retrieval in large repositories.
This is in preparation for creating a multi-threaded version of it.
This is in preparation for creating a multi-threaded version of it.
Please note that if there is a small repo with commitgraph, then there might not be much time at all to compute churn. In case of |
- Extract commit graph traversal logic out of CommitMetrics constructor - Some renaming and simplified logic
- Extrat commit graph traversal logic out of CommitMetrics's constructor - renaming and simplified logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just learned about Git Commit Graph through this PR. Powerful stuff! Glad gitoxide utilizes it.
I tried to refactor the code a bit as the CommitMetrics
constructor was becoming quite overwhelming. If you could please take look to make sure I haven't broken anything 😅 .
I left a few questions for you in the comments.
Without a commit-graph, it takes ~13s to run onefetch on the linux kernel, and with commitgraph it now takes ~3.8s (at the same memory consumption).
That's very impressive!
* unify logic to update number of commits by author. More notes related to the `use_commit_graph()` call: I was thinking about this API for a while, and it's the easiest way to configure this even though it seems redundant. Let me explain, maybe you can find a better solution. In an even more recent version, `use_commit_graph()` doesn't default to `true`, but defaults to the value of `core.commitGraph`, which in turn defaults to true. We, however, always want to use the commitgraph, under the condition that we can use enough threads *and* actually have a commit-graph. Now, one would think that `with_commit_graph` is enough to get what we want, but since `use_commit_graph` generally defaults to `true` under the hood it would make another attempt to open the commit-graph, even though there might not be one. This leads to the commit-graph detection to be run twice in the case there is none. Thus we explicitly set our verdict in `use_commit_graph` to avoid that kind of work. Does it matter, you might ask? Probably not on today's SSDs, a few `stat` calls don't matter, so if you want, you can remove the `use_commit_graph` call. By default, I want an API that allows to avoid any waste, which is why it's implemented that way. ---- More notes related to the overcommit of threads There isn't really a such a thing like exhausting threads, but indeed this implementation will overcommit a little bit for N threads: * N threads for author computation (author) * 1 thread for commit-iteration and author aggregation (aggregator) * 1 threads for churn computation (churn) The aggregator thread has to wait for (author) threads and won't be busy doing its work - thanks to the commit-graph these walks are very fast. It won't actually consume much CPU. The churn thread is the one that runs just as hot if not hotter than the author threads, and I intend to let these contend a little as typically a little bit of over-commit eeks out a little more performance in practice. This is due to IO still playing a role here and once a thread is blocked, it's good to have a choice of more threads to run. Further, the `churn` thread is operating on a best-effort basis so to me its fine to drown it out a little, if the OS decides to do that, but in turn it gets a little additional time while the author threads are joined, so it should even out. Fearless concurrency is fun :).
The latest improvements make more data available during commit iteration,
which makes some improvements possible.
Please note that CI won't be green for a while until
gix
v0.46 was actually released.Tasks
gix
v0.46